Identification of Most Frequently Occurring Lexis in Body-enhancement Medicinal Unsolicited Bulk e-mails

Authors

  • Jatinderkumar R. Saini
  • Apurva A. Desai
Abstract

E-mail has become an important means of electronic communication, but its usefulness is undermined by Unsolicited Bulk e-mail (UBE) messages. UBE comes in many types, such as pornographic, virus-infected and 'cry-for-help' messages, as well as fake and fraudulent offers of jobs, winnings and medicines. UBE poses technical and socio-economic challenges to the use of e-mail, and combating it first requires understanding it. Towards this end, this paper presents a content-based textual analysis of more than 2,700 body-enhancement medicinal UBE messages. Technically, this is an application of text parsing and tokenization to unstructured textual documents, which we approach using the Bag of Words (BOW) and Vector Space Document Model techniques. We identify the most frequently occurring lexis in UBE documents that advertise various body-enhancement products and present an analysis of the top 100 lexis. We also show the relationship between the occurrence of a word from the identified lexis set in a given UBE and the probability that the UBE advertises a fake medicinal product. To the best of our knowledge and based on our survey of the related literature, this is the first formal attempt to identify the most frequently occurring lexis in such UBE through textual analysis. Finally, this work aims to raise awareness of, and mitigate the threat posed by, such enticing but fake UBE.

Keywords: Body Enhancement, Lexis, Medicinal, Unsolicited Bulk e-mail (UBE), Vector Space Document Model, Viagra
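The processing steps named in the abstract (tokenization, Bag of Words, the Vector Space Document Model, ranking the most frequent lexis, and relating word occurrence to the probability that a message is medicinal UBE) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The regex tokenizer, the helper names `tokenize`, `bag_of_words`, `to_vector`, `top_lexis` and `p_label_given_word`, and the three toy messages with their labels are all hypothetical. The sketch also assumes that "most frequently occurring" means total occurrences across the corpus; counting the number of documents that contain a word is an equally plausible reading.

```python
import re
from collections import Counter
from typing import Optional


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lower-case, then keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs: list[str]) -> list[Counter]:
    # Bag of Words: one term-frequency Counter per document; word order is discarded.
    return [Counter(tokenize(doc)) for doc in docs]


def to_vector(bow: Counter, vocab: list[str]) -> list[int]:
    # Vector Space Document Model: term frequencies laid out over a shared vocabulary.
    # Absent terms become 0 because Counter returns 0 for missing keys.
    return [bow[term] for term in vocab]


def top_lexis(bows: list[Counter], k: int) -> list[tuple[str, int]]:
    # Assumption: "most frequent" = total occurrences summed over all documents.
    total = Counter()
    for bow in bows:
        total.update(bow)
    return total.most_common(k)


def p_label_given_word(bows: list[Counter], labels: list[bool], word: str) -> Optional[float]:
    # Empirical P(positive label | word occurs): among documents containing `word`,
    # the fraction labelled True. Returns None if no document contains `word`.
    hits = [label for bow, label in zip(bows, labels) if word in bow]
    if not hits:
        return None
    return sum(hits) / len(hits)


# Hypothetical toy data: invented messages, True = body-enhancement medicinal UBE.
docs = [
    "Buy cheap Viagra now! Viagra online, Viagra delivered.",
    "Herbal Viagra pills, buy now.",
    "Buy lunch before the Monday meeting.",
]
labels = [True, True, False]

bows = bag_of_words(docs)
vocab = sorted(set().union(*bows))
print(top_lexis(bows, 3))                        # [('viagra', 4), ('buy', 3), ('now', 2)]
print(to_vector(bows[1], vocab))                 # [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1]
print(p_label_given_word(bows, labels, "buy"))   # 0.6666666666666666
```

The paper analyses the top 100 lexis; the sketch uses k = 3 only because the toy corpus has three messages. The probability is a plain ratio of document counts, which keeps the link to the abstract's claim visible but gives unstable estimates for rare words.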

Similar resources

Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims to identify unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in the dictionary nor a slang term. The motives of...

Filtering Spam by Using Factors Hyperbolic Trees

Most current anti-spam techniques, like the Bayesian anti-spam algorithm, primarily use lexical matching to filter unsolicited bulk E-mails (UBE) and unsolicited commercial E-mails (UCE). However, the precision of spam filtering is usually low when lexical matching algorithms are used in real dynamic environments. For example, an E-mail of refrigerator advertisements is useful for most f...

A proposed solution for addressing the challenge of patient cries for help through an analysis of unsolicited electronic mail.

BACKGROUND Unsolicited electronic mail (e-mail) is e-mail sent to a physician from a person unknown to the physician, who is seeking professional help. The purpose of this project was to analyze unsolicited e-mails sent to a digital textbook author to: 1) characterize the e-mails, 2) determine what resources would be necessary to answer the e-mails, and 3) propose a standard approach to reply t...

Unsolicited E-mails to Forensic Psychiatrists.

E-mail communication is pervasive. Since many forensic psychiatrists have their e-mail addresses available online (either on personal websites, university websites, or articles they have authored), they are likely to receive unsolicited e-mails. Although there is an emerging body of literature about exchanging e-mail with patients, there is little guidance about how to respond to e-mails from n...

Sender and Receiver Addresses as Cues for Anti-Spam Filtering

This study analysed the sender and receiver addresses of 3,417 unsolicited e-mails. Over 60.3% of the unsolicited e-mails were found to have an invalid sender address, and 92.8% of receiver addresses did not appear in the “To” or “CC” headers. The analytical results indicated that e-mail addresses in the header could provide a cue for filtering junk e-mails.

Publication date: 2012